ABSTRACT

Bats harbor a large diversity of coronaviruses (CoVs), several of which are related to
zoonotic pathogens that cause severe disease in humans. Our screening of bat samples
collected in Kenya during 2007-2010 not only detected RNA from several novel CoVs but,
more significantly, identified sequences that were closely related to human CoVs NL63 and
229E, suggesting that these two human viruses originate from bats. We also demonstrated
that human CoV NL63 is a recombinant between NL63-like viruses circulating in Triaenops
bats and 229E-like viruses circulating in Hipposideros bats, with break-points located near the
5’ and 3’ ends of the spike (S) protein gene. In addition, two further inter-species
recombination events involving the S gene were identified, suggesting that this region may
represent a recombination “hotspot” in CoV genomes. Finally, using a combination of
phylogenetic and distance-based approaches, we showed that the genetic diversity of bat CoVs is
primarily structured by host species and secondarily by geographic distance.


IMPORTANCE 

Understanding the driving forces of cross-species virus transmission is central to 
understanding the nature of disease emergence. Previous studies have demonstrated that bats 
are the ultimate reservoir hosts for a number of coronaviruses (CoVs) including ancestors of 
SARS-CoV, MERS-CoV, and HCoV-229E. However, the evolutionary pathways of bat 
CoVs remain elusive. We provide evidence for natural recombination between distantly
related African bat coronaviruses associated with Triaenops afer and Hipposideros sp. bats
that resulted in an NL63-like virus, an ancestor of the human pathogen HCoV-NL63. These
results suggest that inter-species recombination may play an important role in CoV evolution
and the emergence of novel CoVs with zoonotic potential.
INTRODUCTION
Coronaviruses (CoVs) (subfamily Coronavirinae, family Coronaviridae, order Nidovirales) 
are common infectious agents that infect a wide range of hosts including humans, causing 
respiratory, gastrointestinal, liver, and neurologic diseases, and that possess the largest
genomes of any RNA virus described to date (1). The subfamily Coronavirinae is currently
classified into four genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and
Deltacoronavirus (2). The alphacoronaviruses (alpha-CoV) and betacoronaviruses (beta-CoV) 
are exclusively found in mammals while the gammacoronaviruses (gamma-CoV) and 
deltacoronaviruses (delta-CoV) are mainly associated with birds. Presently, the greatest 
diversity of alpha- and beta-CoVs has been documented in bats, which in part reflects the 
more intensive surveillance of these animals since Rhinolophus spp. bats were implicated as 
the reservoir hosts for SARS-related CoVs (3, 4). This surveillance resulted in the discovery
of potential reservoir host (bat) species for two other human CoVs: human CoV 229E
(HCoV-229E), a relative of which is present in Hipposideros bats (5, 6), and Middle East
respiratory syndrome coronavirus (MERS-CoV), for which related viruses are present in
Pipistrellus, Tylonycteris, and Neoromicia bats (7-10), although the most likely reservoir host
of human MERS-CoV identified to date is the dromedary camel (11). Most recently,
HCoV-229E-like CoVs were also identified in camels, although their role in human infection is
unknown (12).

Africa is a major hotspot of emerging zoonotic diseases. With its rich biodiversity,
Africa is home to many bat species, including those that serve as reservoirs of the agents
of important zoonotic diseases such as Marburg hemorrhagic fever and rabies (13). Our initial
screening demonstrated the presence of diverse CoVs in African bats, including bats
collected in southern Kenya in 2006 (14, 15) and in other countries,
including South Africa, Nigeria, and Ghana (16). Furthermore, recent studies have provided
strong evidence that HCoV-229E originated from bat viruses circulating in Africa (5),
underscoring the zoonotic potential of bat-borne CoVs from this continent.


One human coronavirus, HCoV-NL63, was first isolated in 2004 from an aspirate of
an 8-month-old boy suffering from pneumonia in the Netherlands (17). While the clinical
significance of this virus is debated, it has a worldwide distribution and is known to infect
both the upper and lower respiratory tract (18). Based on a phylogeny of the RNA-dependent
RNA polymerase (RdRp), HCoV-NL63 is related to another human virus, HCoV-229E, but
no close relatives had been identified in bats (16). Although Huynh et al. (19) suggested that a
virus (ARCoV.2/2010/USA) isolated from the American tricolored bat (Perimyotis subflavus) 
may share common ancestry with HCoV-NL63, the genetic distance between the two viruses 
is large, and their close relationship has not been corroborated in other phylogenetic analyses 
(16, 20). Nevertheless, the successful passage of HCoV-NL63 in an immortalized bat cell 
line suggests its potential association with bats (19). 

As is well appreciated, recombination leads to rapid changes in the genetic diversity of
RNA viruses (21). CoVs represent a classic example of viruses with high frequencies of
homologous recombination through discontinuous RNA synthesis (22). Indeed, under
experimental conditions, the recombination frequency can be as high as 25% for the entire
CoV genome (23). Recombination in CoVs is also frequently reported under natural
conditions, including in emerging human pathogens such as SARS-CoV (24, 25), MERS-CoV
(11), HCoV-OC43 (26), and HCoV-NL63 (27), although most reported events involve
closely related viruses.

The Global Disease Detection (GDD) Program of the Centers for Disease Control and
Prevention (CDC, Atlanta, GA) is focused on the detection of emerging infectious agents
worldwide. One of the GDD projects was directed toward the detection of such potential
zoonotic pathogens in African bats. Since the initial study performed in Kenya in 2006
(14, 15), expanded surveillance of bat CoVs has been performed in Kenya and in other
countries, including Nigeria, Georgia, the Democratic Republic of the Congo, Guatemala,
and Peru. The project included more bat species and geographic
locations, allowing a more thorough investigation of the genetic diversity and ecological
dynamics of CoV circulation in bats. In this study, we performed an ecological and
evolutionary characterization of CoVs circulating in Kenya and identified distinct CoVs from
Triaenops afer and Hipposideros sp. bats that are phylogenetically related to HCoV-NL63 in
different parts of the genome. Based on these data, we propose a scenario for the origin and
evolutionary history of HCoV-NL63 and related viruses.
MATERIALS AND METHODS

Sample collection. Between 2007 and 2010 a total of 2050 bat specimens were collected 
from 30 different locations in Kenya (Table S1) in collaboration with the CDC GDD regional 
country office in Kenya and the National Museums of Kenya. The bats were captured using mist
nets or hand nets, or by hand. The protocol (2096FRAMULX-A3) was approved by the CDC
IACUC and by the Kenya Wildlife Service. Upon capture, each bat was measured, sexed, and
identified to species by a trained field biologist. Subsequently, fecal and (when possible) oral
swabs were collected in compliance with the field protocol and were then transported on dry
ice from the field to -80°C storage before further processing.


CoV RNA detection. Each fecal and oral swab was suspended in 200 µL of phosphate-buffered
saline. Viral total nucleic acids (TNA) were extracted using the QIAamp Mini Viral
Spin kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s instructions, followed
by semi-nested RT-PCR (SuperScript III One-Step RT-PCR kit and Platinum Taq kit,
Invitrogen, San Diego, CA, USA) using primer sets designed to target a conserved genome
region of alpha-, beta-, gamma-, and delta-CoVs, respectively (15). PCR products of the
expected size (~400 nucleotides) were purified by gel extraction using the QIAquick Gel
Extraction kit (Qiagen, Valencia, CA, USA) and sequenced in both directions on an ABI
Prism 3130 automated sequencer (Applied Biosystems, Foster City, CA, USA). For
validation, the RT-PCR procedure was repeated for each CoV-positive specimen.


Bat mitochondrial gene sequencing. Bat species identification was further confirmed by
sequencing the host mitochondrial cytochrome b (cytB) gene in each of the CoV-positive
specimens. A 1104-bp fragment of the cytB gene was amplified and sequenced using
previously described methods and primers (14, 15).
Phylogenetic analyses. This study generated a total of 240 CoV RdRp sequences (402 bp)
from Kenyan bats. These sequences were first aligned in MAFFT v7.013 (28), using amino
acid sequences as a guide for the nucleotide sequence alignment. Phylogenetic trees were
then inferred using the maximum likelihood (ML) method available in PhyML version 3.0
(29), assuming a general time-reversible (GTR) model with discrete gamma-distributed rate
variation among sites (four rate categories) and the SPR branch-swapping algorithm. To
produce a more condensed data set, we clustered highly similar sequences from the same
geographic location and host species and randomly chose one or two to represent each
cluster. This condensed data set was subsequently combined with 121 reference sequences
from GenBank representative of the global genetic diversity of alpha- and beta-CoVs. ML
phylogenetic trees of these final alignments were inferred using the same procedure and
substitution model as described above.
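The representative-selection step above can be sketched as a greedy clustering within each (location, host) group. This minimal Python sketch is not the authors' procedure: the identity cut-off (`threshold`), the greedy first-match rule, and all function names are hypothetical, because the text states only that "highly similar" sequences were clustered.

```python
import random


def p_identity(a, b):
    """Fraction of identical sites between two aligned sequences, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def pick_representatives(records, threshold=0.97, per_cluster=1, seed=0):
    """records: (seq_id, location, host, aligned_seq) tuples.

    Sequences are clustered greedily within each (location, host) group: a sequence
    joins the first cluster whose founder it matches at >= threshold identity,
    otherwise it founds a new cluster. Up to per_cluster ids are drawn at random
    from every cluster. The threshold value is an assumption, not from the study.
    """
    rng = random.Random(seed)
    groups = {}
    for rec in records:
        groups.setdefault((rec[1], rec[2]), []).append(rec)

    chosen = []
    for key in sorted(groups):
        clusters = []
        for rec in groups[key]:
            for cluster in clusters:
                if p_identity(rec[3], cluster[0][3]) >= threshold:
                    cluster.append(rec)
                    break
            else:
                clusters.append([rec])
        for cluster in clusters:
            k = min(per_cluster, len(cluster))
            chosen.extend(r[0] for r in rng.sample(cluster, k))
    return chosen
```

Setting `per_cluster=2` corresponds to keeping "one or two" sequences per cluster; a fixed `seed` keeps the random choice reproducible.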


Comparisons of viral genetic, geographic, and host genetic distance matrices. To
determine the relationship between viral genetic, geographic, and host genetic distances, we
compiled a data set containing the Kenyan CoV samples generated in this study. The genetic
distance matrices were produced from pairwise comparisons, either as uncorrected
percentage differences or as distances calculated from the phylogenetic trees (patristic
distances) using the Patristic v1.0 program (30). The geographic distances (great-circle
distances, in km) were calculated using the formula “distance = (acos((sin(latitude1) *
sin(latitude2)) + (cos(latitude1) * cos(latitude2) * cos(longitude2 - longitude1)))) * 6371”,
where 6371 km is the mean radius of the Earth and coordinates are expressed in radians; the
spatial coordinates of the samples were derived from the geographic location information.
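Because the trigonometric functions in this formula expect radians, coordinates recorded in decimal degrees must be converted first. The following minimal Python sketch applies the same spherical-law-of-cosines formula and builds the pairwise distance matrix; the function names and the clamping step, which keeps `acos` inside its domain when rounding pushes the cosine slightly above 1 for identical sites, are our own illustrative additions.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, as in the formula above


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    # clamp floating-point rounding so acos never receives a value outside [-1, 1]
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.acos(cos_angle) * EARTH_RADIUS_KM


def distance_matrix(coords):
    """Symmetric matrix of great-circle distances for a list of (lat, lon) pairs."""
    n = len(coords)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = great_circle_km(*coords[i], *coords[j])
    return d
```

For short distances the law of cosines loses some precision, which is why the haversine form is often preferred; at the scale of sites across southern Kenya the two give practically identical results.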

We used Mantel correlation analyses to test the strength of the correlation between
these matrices (31). Both simple and partial Mantel tests were performed, and the
significance of each correlation was evaluated with 10,000 permutations. To assess which of
the two factors, geographic or host genetic distance, best explained the total variation in the
virus genetic distance matrices, we performed multiple linear regression on these distance
matrices (32). The statistical significance of each regression was evaluated with 10,000
permutations. To examine whether the degree of virus genetic relatedness corresponded to
the scale of geographic distance or host relatedness, we generated Mantel correlograms. In
each correlogram, 10-12 distance classes were assigned following an equal-frequency
criterion, so that each class contained a similar number of pairwise comparisons. All
statistical analyses were performed using the ecodist package in R 3.0.2 (33), and all
statistical results were considered significant at the P = 0.05 level.
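For readers unfamiliar with the procedure, the following minimal Python sketch shows the logic of a simple Mantel permutation test (Pearson correlation of the upper-triangle entries, with the rows and columns of one matrix permuted together) and the equal-frequency assignment of pairwise distances to correlogram classes. It is an illustration only, not the ecodist implementation used in this study; the function names, the one-sided alternative (positive correlation), and the way ties are split across classes are assumptions.

```python
import math
import random


def upper_triangle(m):
    """Flatten the entries above the diagonal of a square distance matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(dx, dy, permutations=10000, seed=1):
    """Simple Mantel test; returns (r, one-sided permutation P for r > 0)."""
    rng = random.Random(seed)
    x = upper_triangle(dx)
    r_obs = pearson(x, upper_triangle(dy))
    order = list(range(len(dy)))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)
        # permute rows and columns of dy together so it stays a valid distance matrix
        permuted = [[dy[a][b] for b in order] for a in order]
        if pearson(x, upper_triangle(permuted)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


def equal_frequency_classes(distances, n_classes):
    """Label each pairwise distance with a class index so classes hold ~equal counts."""
    ranked = sorted(range(len(distances)), key=distances.__getitem__)
    labels = [0] * len(distances)
    for rank, idx in enumerate(ranked):
        labels[idx] = rank * n_classes // len(distances)
    return labels
```

In one common construction of a Mantel correlogram, each distance class is then coded as a binary matrix (1 for pairs in the class, 0 otherwise) and tested against the virus distance matrix with the same permutation procedure.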


Full genome sequencing and sequence analyses. Five viruses representative of the full
diversity of the CoVs newly described here were selected for full genome sequencing:
BtKY229E-1, BtKY229E-8, BtKYNL63-9a, BtKYNL63-9b, and BtKYNL63-15. We first
sequenced a number of conserved regions throughout the genome using several semi-nested
or nested consensus degenerate RT-PCR amplicons. These regions were then bridged using
sequence-specific RT-PCR followed by Sanger sequencing (< 2 kb) or PacBio sequencing
(> 2 kb). The consensus genome sequences assembled from PacBio reads were subsequently
confirmed by sequence-specific RT-PCR and Sanger sequencing (GenBank
accession numbers KY073744-KY073748). The 5’ and 3’ genome termini could not be
determined because too little RNA remained; the terminal sequences were therefore derived
from PCR primers based on conserved genome regions of alpha-CoVs.

For each complete genome sequence, potential ORFs were predicted based on the
conserved core sequence, 5’-CUAAAC-3’, with a minimum length of 66 amino acids.
Ribosomal frameshifts were identified based on the presence of the conserved slippery
sequence, “UUUAAAC”. For phylogenetic analyses, the data set was first separated into six
genes: ORF1a, ORF1b, spike (S), envelope (E), membrane (M), and nucleoprotein (N).
The data set for each gene was translated into amino acid sequences and aligned
using MAFFT v7.013. Phylogenetic trees were then inferred using PhyML as described
above. Recombination events were first identified from incongruent topologies among
these initial phylogenies and were then confirmed and characterized using
Simplot v3.5.1 (34). In the Simplot analysis, seven sequences were analyzed, including the
potential recombinant, the parental viruses, and an outgroup. Similarity between the
recombinant and each of the other sequences was plotted using a sliding window
of 1000 bp with a step size of 10 bp.
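The ORF-prediction and frameshift criteria above can be illustrated with a simple forward-strand sequence scan. The Python sketch below is hypothetical and is not the annotation pipeline used in this study: it reads the genome as DNA (so CUAAAC becomes CTAAAC and UUUAAAC becomes TTTAAAC), takes the first ATG within an assumed `max_gap` of 200 nt downstream of each core-sequence hit, and keeps reading frames of at least 66 codons before the first in-frame stop.

```python
import re

CORE_SEQUENCE = "CTAAAC"   # 5'-CUAAAC-3' written as DNA
SLIPPERY_SITE = "TTTAAAC"  # UUUAAAC written as DNA
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_motif(seq, motif):
    """0-based start positions of every (possibly overlapping) occurrence of motif."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]


def orf_length(seq, start):
    """Codons from start up to (not including) the first in-frame stop, or None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (i - start) // 3
    return None


def candidate_orfs(genome, min_aa=66, max_gap=200):
    """(core position, ATG position, length in aa) for ORFs downstream of each core hit."""
    seq = genome.upper().replace("U", "T")
    hits = []
    for core in find_motif(seq, CORE_SEQUENCE):
        atg = seq.find("ATG", core, core + max_gap)  # max_gap is an assumed search window
        if atg == -1:
            continue
        length = orf_length(seq, atg)
        if length is not None and length >= min_aa:
            hits.append((core, atg, length))
    return hits


def slippery_sites(genome):
    """Candidate -1 ribosomal frameshift sites matching the UUUAAAC slippery sequence."""
    return find_motif(genome.upper().replace("U", "T"), SLIPPERY_SITE)
```

In practice, the ORF1a/ORF1b frameshift site would be identified as a slippery-sequence hit near the end of ORF1a rather than any match in the genome.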


RESULTS 

Prevalence of CoV in Kenyan bats. We examined bats from at least 27 species (17 genera)
collected over a four-year period (2007-2010) from 30 locations across the southern part of
Kenya (Figure 1). A total of 2,050 bat samples were screened for CoV RNA using a
pan-coronavirus RT-PCR assay. We found an overall prevalence of 11.7% (240/2,050 bats)
(Table S1). This overall prevalence is in line with recent reports of CoVs in bats from
numerous locations including South Africa, Mexico, Philippines, Kenya, United Kingdom, 
Japan, Italy, and Ghana (6, 14, 15, 35-40). 

No CoV-positive samples were obtained from bats of several species (Chaerephon pumilus,
Coleura afra, Lissonycteris angolensis, Miniopterus africanus, Neoromicia tenuipinnis,
Neoromicia sp., Nycteris sp., Pipistrellus sp., and Scotoecus sp.), although sample
numbers were limited and might not reflect the true prevalence (Table S1). Conversely, in bats
of several other species the CoV prevalence was high (Cardioderma cor, 25%; Eidolon
helvum, 21%; Epomophorus labiatus, 28.6%; Hipposideros sp., 27.6%; Miniopterus minor,
22.6%; Otomops martiensseni, 28.6%; Rhinolophus hildebrandtii, 31.3%; Rhinolophus sp.,
28.9%; Triaenops afer, 26.7%). Most species (21/27) were sampled at more than one
location. Of note, we detected CoVs in 21% of E. helvum bats tested in Kenya, whereas a
previous study in Ghana failed to detect any CoVs in a similar number of bats of this
species (6).


Phylogenetic diversity of Kenyan bat CoVs. The viral sequences identified in Kenyan bats 
showed a remarkable diversity within both alpha- and beta-CoVs (Figure 2). Based on our 
phylogenetic analysis, the CoVs newly identified here can be grouped into 20 phylogenetic 
lineages (Figure 2). Many of the sampled bat genera are associated with more than one viral
lineage. Furthermore, in some cases, divergence among CoVs within the same host genus
may be associated with differences in sample type. For example, we found two
lineages of CoV in Rousettus aegyptiacus bats, one of which was present in oral swabs
(Figure 2: L7 Rousettus) while the other was identified in fecal swabs (L17 Rousettus).
Bat CoVs are believed to have a primarily intestinal tissue tropism, so fecal swabs are the
samples of choice. In agreement with this, only four viruses were identified from oral swab
samples (L7 Rousettus), as indicated in the phylogeny (Figure 2).

Our phylogenetic analyses also revealed a number of cross-species transmission
events at the genus level, many of which appeared to be transient spill-overs with no evidence
of onward transmission. This pattern was observed as CoV sequences recovered from bats of
one genus that fell as single tips within a clade otherwise associated with a different bat
genus. In our Kenyan data set, there were seven such cross-species transmission events in
total, each represented by a single sequence (marked in red in Figure 2), suggesting that
these viruses most likely underwent limited transmission within their new hosts,
although this hypothesis requires confirmation with a larger set of samples.
A more comprehensive and informative phylogeny (Figure 3) was obtained after
including representative global CoV sequences from GenBank, which also included the
Kenyan viruses previously reported (15). This phylogeny, which included viral sequences
recovered from bats of more than 50 species (30 genera), allowed accurate phylogenetic
assignment of the viruses described in this study (Figure 3). Importantly, the newly
discovered viruses from Kenya greatly extend our previous work (15) by (i)
expanding the diversity of existing lineages, including the Miniopterus-, Rhinolophus-, and
Scotophilus-associated CoV clusters in the genus Alphacoronavirus and the Rousettus- and
Rhinolophus-associated CoV clusters in the genus Betacoronavirus; and (ii) revealing
new viruses from either a novel bat host (i.e., Triaenops) or new divergent CoV clusters in
known hosts (e.g., Rhinolophus, Rousettus, and Chaerephon) (Figure 3).

The phylogeny suggests both ancient virus-host co-divergence and recent cross- 
species transmission of CoVs between bats and other mammalian hosts. The phylogeny 
clearly demonstrates that CoVs from two host groups, one dominated by bats and the other 
exclusively by non-chiropteran mammals, formed sister clades for both alpha- and beta-CoVs 
(Figure 3), suggestive of an ancient divergence between them. Conversely, several non- 
chiropteran CoVs are nested within the diversity of bat CoVs, suggesting that these viruses 
are relatively recent introductions from bats. These cross-species transmission events resulted
in the emergence of severe (SARS-CoV and MERS-CoV) and mild (HCoV-NL63 and
HCoV-229E) human pathogens, as well as animal pathogens (porcine epidemic diarrhea virus
[PEDV] and alpaca respiratory CoV). Interestingly, HCoV-NL63, previously thought to be
related to a virus from the North American tricolored bat (P. subflavus) (19), is deeply
nested in our phylogeny within the newly identified CoVs from African Triaenops afer bats
(Figure 3), while the P. subflavus virus (labeled green in Figure 3) grouped with a North
American CoV sampled from a Myotis volans bat (Figure 3). Therefore, Triaenops afer bats
likely represent the most recent chiropteran reservoir host of viruses ancestral to
HCoV-NL63. In addition, our results identified 16 additional 229E-like viruses (L14,
Figure 2), providing further evidence that Hipposideros bats in Africa harbor viruses that are
ancestral to HCoV-229E (5, 6).


Host and spatial dynamics of bat CoVs in Kenya. We used Mantel’s test to compare the 
virus and host genetic distance matrices, as well as virus and geographic distance matrices. 
Notably, the correlation values were positive and highly significant in both comparisons 
(Table 1), suggesting that both host and geography have shaped the structure of virus genetic 
diversity. This conclusion held in partial Mantel analyses and multiple linear
regression analyses, in which we tested the association between two matrices while
controlling for the third (Tables 1 and 2). Importantly, however, in both simple and partial Mantel analyses,
the virus genetic distance matrices had much higher correlation with host genetic distance 
matrices than with geographic distance matrices (Table 1), indicating that bat CoV diversity 
is more structured by host than by geographic distance. 

Next, we used Mantel correlograms to examine the effects of (i) geographic
distance (Figure 4A) and (ii) host genetic distance (Figure 4B) on virus diversity. The Mantel
correlation between virus and host genetic distance decreased from highly significantly
positive at short host genetic distances to highly significantly negative at long distances.
Importantly, the crossing-over point was at a host genetic distance of around 0.15-0.19,
which marks the boundary between intra- and inter-genus host diversity (Figure 4B). In
contrast, no obvious clinal pattern with geographic distance was observed within the
Kenyan data set.


Full genome characterization and recombination analyses of NL63-like and 229E-like
viruses. To further explore the evolution of the NL63-like and 229E-like viruses, we
generated the complete genome sequences of five representative bat-derived CoVs: three
(BtKYNL63-9a, BtKYNL63-9b, and BtKYNL63-15) from the NL63-like group and
two (BtKY229E-1 and BtKY229E-8) from the 229E-like group (L12-L14, Figure 2). All
the viruses newly described here share the same ORF arrangement within each group:
ORF1ab-S-ORF4-E-M-N-ORF8 in 229E-related viruses and ORF1ab-S-ORF3-E-M-N-ORFx
in NL63-related viruses (Figure 5, Tables 3 and 4). The additional
ORF8/ORFx was identified at the 3’ end of the genome in all bat NL63-like and 229E-like
viruses characterized in this study, although it was missing in both human viruses
(HCoV-229E and HCoV-NL63). The ORF8 in bat 229E-like genomes is named by analogy
with the ORF8 of Ghanaian bat and dromedary 229E-like CoVs (5, 12). The ORF8 of
BtKY229E-1 shared 60% protein identity with its closest relatives, while BtKY229E-8 has a
shorter and highly divergent ORF8. The ORFx sequences of NL63-like viruses showed very
low identity (21-33% at the amino acid level). Similar to the bat 229E-like CoVs recently
discovered in Ghana (5), the S genes of our bat 229E-like CoVs encode a considerably longer
5’ S1 portion (an additional 185 amino acids) than those of HCoV-229E and the alpaca and
dromedary 229E-like viruses (12).

For comparison, we also included 21 genome sequences representative of the 
diversity in the genus Alphacoronavirus. The phylogeny based on the ORF1b protein
alignment confirmed that the NL63-like and 229E-like groups are monophyletic (Figure 6).
Given that each group is associated with a specific bat genus, it is likely that the ORF1b
genes of the human viruses (i.e., HCoV-NL63 and HCoV-229E) were ultimately derived from
Triaenops-associated CoVs and Hipposideros-associated CoVs, respectively. The
relationship between Hipposideros bat CoVs and HCoV-229E was also demonstrated by
Corman et al. (5) based on specimens obtained in Ghana. Compared with the viruses described
in that study, the newly identified Kenyan viruses (BtKY229E-1 and BtKY229E-8) were
among those more distantly related to HCoV-229E (Figure 6 and Table 3). As for the
NL63-like group, HCoV-NL63 was nested within the diversity of three lineages of
Triaenops-associated CoVs, among which BtKYNL63-9a showed the closest relationship in
all genome regions with the exception of the S gene (Figure 6 and Table 3).

Strikingly, the phylogeny of the S protein suggested an entirely different evolutionary 
history for HCoV-NL63 compared to the rest of the genome (Figure 6). Specifically, for all 
the proteins with the exception of S, HCoV-NL63 clustered with the NL63-like group. 
However, in the S protein, HCoV-NL63 was deeply nested within the 229E-like group, 
associated exclusively with viruses from Hipposideros bats, and particularly similar to the 
sequences BtKY229E-1 and BtKY229E-8 newly identified during this study (Figure 6).
Interestingly, BtKY229E-1 exhibited the closest resemblance to HCoV-NL63 in the
receptor-binding domain (RBD) (41), especially in the three receptor-binding motifs (RBMs),
whereas other viruses exhibited less similarity in these regions (Figure 7A). A phylogeny
based on the RBD region confirmed this observation (Figure 7B), although it remains
uncertain whether these bat viruses use the same host cell receptor.

To further characterize this recombination event, we performed genome-scale 
similarity comparisons between HCoV-NL63 and related viruses (Figure 8). The analysis 
confirmed the chimeric nature of the HCoV-NL63 genome, with only the spike gene
involved in recombination, delimited by two break-points: one located near the 5’ end of the
S gene and the other approximately 200 nucleotides upstream of its 3’ end. To exclude the
possibility of artifactual recombination, each break-point was further confirmed by RT-PCR
and Sanger sequencing of a single amplicon spanning it. Collectively, these data show
that HCoV-NL63 evolved from a recombination event between CoVs from the NL63-like
and 229E-like groups.
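The logic of such a similarity scan can be sketched in a few lines, using the window (1000 bp) and step (10 bp) given in Materials and Methods. In this hypothetical Python illustration, which is not Simplot itself, identity to each putative parent is computed in sliding windows over a gap-aware alignment, and candidate break-points are taken as the window midpoints at which the more similar parent changes. Real analyses also assess statistical support (for example, by bootscanning) and, as here, confirm break-points experimentally.

```python
import math


def window_identity(query, ref, window=1000, step=10):
    """(window midpoint, fraction identical) for aligned query vs ref; gap columns ignored."""
    if len(query) != len(ref):
        raise ValueError("sequences must come from the same alignment")
    profile = []
    for start in range(0, len(query) - window + 1, step):
        pairs = [(a, b) for a, b in zip(query[start:start + window], ref[start:start + window])
                 if a != "-" and b != "-"]
        identity = sum(a == b for a, b in pairs) / len(pairs) if pairs else math.nan
        profile.append((start + window // 2, identity))
    return profile


def switch_points(query, parent_a, parent_b, window=1000, step=10):
    """Window midpoints where the more similar parent changes (candidate break-points)."""
    prof_a = window_identity(query, parent_a, window, step)
    prof_b = window_identity(query, parent_b, window, step)
    closer = []
    for (mid, ia), (_, ib) in zip(prof_a, prof_b):
        if math.isnan(ia) or math.isnan(ib):
            continue  # skip windows that contain only gaps
        closer.append((mid, "A" if ia >= ib else "B"))
    return [mid for (mid, side), (_, prev) in zip(closer[1:], closer) if side != prev]
```

Because each window spans 1000 bp, a switch point only locates a break-point to within roughly half a window, which is one reason single-amplicon confirmation across each junction is valuable.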

In addition to HCoV-NL63, we identified a number of other recombination events
between divergent CoVs involving the S gene. One example is BtKYNL63-15, newly
identified here. Throughout the genome, BtKYNL63-15 showed strong similarity (79-99%
protein identity in the ORF1ab, ORF4, M, E, and N genes) with BtKYNL63-9b. In
contrast, the identity between the S protein sequences of these viruses was only 53%. In
the S protein phylogeny, BtKYNL63-15 did not cluster with NL63-like viruses but instead
clustered with Miniopterus bat CoV HKU8 and Chaerephon bat CoV KY22 (Figure 6).
Interestingly, HKU8 itself is a recombinant in the S gene region (Figure 6). These results
suggest that the spike gene of CoVs is subject to relatively frequent recombination, even
between divergent viruses.


DISCUSSION 


In this study we significantly extended existing knowledge on CoV diversity, their 
association with specific bat species, the relatedness between bat and human CoVs, and 
natural recombination events in the CoV spike (S) protein gene between viruses from 
different lineages. 

Notably, we found that host species exerts a greater influence on CoV diversity in bats
than geographic distance, which may be explained by the ability of bats to fly (including
the long-distance migrations typical of some species) and disperse their pathogens over vast
territories (42). A closer inspection of the Mantel correlogram suggests the presence of less
structured (homogeneous; Mantel statistic r > 0) and highly structured (Mantel statistic r < 0)
diversity, which, strikingly, corresponds to the division between intra-genus (10-20%)
and inter-genus (> 20%) host genetic distances (Figure 4B). This suggests that within-genus
virus transmission occurs significantly more frequently than between-genus transmission,
which is consistent with previous observations that phylogenetic clustering is less
constrained at the host species level than at the genus level (16, 43). While it is commonly
accepted that host phylogeny constrains virus cross-species transmission to some extent (44),
the stronger demarcation at the genus level is of particular interest. In fact, bats of different
species, genera, and families frequently roost together (in caves, tree holes, and other
shelters), sometimes in dense aggregations, which provides abundant opportunities for
mechanical transmission of pathogens between host species. Therefore, our data suggest that
distinctions between bats at the genus level might mark a threshold at which differences in
the cellular and immunological environment become a major challenge for a virus to switch
hosts. This, in turn, would lead to the pattern of ‘preferential host switching’ that has been
observed in a number of other viruses (45).

The detection of distinctive HCoV-NL63-like and HCoV-229E-like sequences in bats 
sheds new light on CoV evolution. In particular, we provide strong evidence that HCoV- 
NL63 has a zoonotic recombinant origin. Although the majority of the HCoV-NL63 genome 
originates from the viruses circulating in Triaenops afer bats, its spike protein gene is derived 
from a 229E-like virus circulating in Hipposideros spp. bats. However, despite the strong 
signal for recombination, both putative parental strains show substantial genetic distances 
from human CoVs. This most likely reflects extensive post-recombination sequence
divergence, which in turn suggests that the recombination event occurred before the
emergence of HCoV-NL63 in humans.

Most of the recombination events reported here involve breakpoints around the S
gene. Indeed, similar breakpoints have also been reported for SARS-CoV and SARS-like
CoVs (24, 25), HCoV-OC43 (26), and a feline CoV (46), suggesting that this region is a
recombination ‘hotspot’ in many CoVs. It has been argued that strong secondary structure
between ORF1a and the S gene may promote transcriptional pulsing, facilitating
recombination (47). However, there is also evidence that this recombination hotspot does not
exist under non-selective conditions (48), such that it may reflect the successful spread of
beneficial recombinants rather than an elevated rate of recombination per se. This hypothesis
is consistent with the fact that the spike protein is intimately involved in interactions with the
host immune system.

Importantly, our results also revealed that recombination has resulted in similar S 
proteins in the two human viruses HCoV-NL63 and HCoV-229E, such that acquisition of a 
229E-like S protein may have contributed to the emergence of NL63-like viruses in humans. 
However, despite this similarity of S protein sequences, these two human viruses use
different receptors (ACE2 and aminopeptidase N for HCoV-NL63 and HCoV-229E,
respectively) to enter human cells. Within the 229E-like group, the RBD of HCoV-NL63 is
more similar to that of BtKY229E-8 than to that of HCoV-229E (Figure 7), suggesting that
the RBD of BtKY229E-8 is the more likely prototype of the RBD in HCoV-NL63.

Until recently, most reported recombination events in CoVs involved viruses 
associated with closely related host species, although recombination between highly 
divergent CoVs has been demonstrated experimentally (49-51). The apparent lack of
interspecies recombination under natural conditions most likely reflects insufficient
sampling of complete genome sequences that are truly representative of coronavirus
diversity. Indeed, a number of viruses, such as HKU2, display phylogenetic incongruence
across different parts of the genome (52), although the lack of one of the putative parental 
strains has prevented clear identification of a recombinant history. 

Finally, our study provides insights into the evolutionary history of CoVs. Although it is
unclear whether bat CoVs are the direct ancestors of all alpha- and beta-CoVs, given the
presence of non-bat CoV clades at basal phylogenetic positions in both genera (Figure 3),
bat-borne CoVs constitute a substantial part of the diversity of alpha- and beta-CoVs. In
addition, six lineages of non-bat CoVs are nested within the bat-borne clades. These likely
represent independent and successful adaptations via shifts from the progenitor reservoir
species (bats) to other mammals. Four well-characterized human CoVs lie within these
clades. However, it is worth noting that bats may not have transmitted these viruses directly
to humans. Indeed, HCoV-229E is more closely related to viruses circulating in camels than
to those in bats, suggesting that camels may be intermediate hosts between bats and humans
(12). Similarly, other human CoVs such as SARS-CoV and MERS-CoV reached humans
through intermediate terrestrial mammals rather than directly from bats; such intermediate
hosts have a greater chance of contact with humans. This underlines a typical zoonotic route
from bat-associated CoVs to humans via terrestrial mammals.
Figure Legends 


Figure 1. Map of Kenya showing the geographic locations of 30 bat collection sites. 


Figure 2. Phylogeny of the RdRp sequences of all CoVs discovered in this study. The host 
(bat genus), number of sequences, and operational classification (lineage) are shown to the 
right of the tree. Branches representing minority host genera within a lineage dominated by a 
single host genus are colored red and marked with a solid circle. The tree is mid-point 
rooted for clarity only, and support values are shown for internal branches only. 
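Mid-point rooting places the root halfway along the longest path between any two tips, so that the two most divergent sequences are equidistant from it. The following is a minimal Python sketch of that idea, assuming an unrooted tree stored as a hypothetical adjacency dictionary with branch lengths; the helper names (`farthest_node`, `midpoint_root`) and the toy tree are illustrative only and are not the software used to draw Figure 2.

```python
from typing import Dict, Tuple

# Hypothetical representation: unrooted tree as node -> {neighbour: branch length}.
Tree = Dict[str, Dict[str, float]]


def farthest_node(tree: Tree, start: str) -> Tuple[str, float, Dict[str, str]]:
    """Return the node farthest from `start`, its path length, and parent pointers."""
    dist = {start: 0.0}
    parent: Dict[str, str] = {}
    stack = [start]
    while stack:  # depth-first traversal; a tree has exactly one path between two nodes
        node = stack.pop()
        for neighbour, length in tree[node].items():
            if neighbour not in dist:
                dist[neighbour] = dist[node] + length
                parent[neighbour] = node
                stack.append(neighbour)
    far = max(dist, key=dist.get)
    return far, dist[far], parent


def midpoint_root(tree: Tree) -> Tuple[str, str, float]:
    """Locate the mid-point of the longest tip-to-tip path.

    Returns (u, v, offset): the root sits on branch u-v, `offset` away from u.
    """
    leaf_a, _, _ = farthest_node(tree, next(iter(tree)))
    leaf_b, diameter, parent = farthest_node(tree, leaf_a)  # parents point toward leaf_a
    half = diameter / 2
    node, walked = leaf_b, 0.0
    while True:  # walk from leaf_b back toward leaf_a until half the path is covered
        up = parent[node]
        length = tree[node][up]
        if walked + length >= half:
            return node, up, half - walked
        walked += length
        node = up


example: Tree = {
    "A": {"X": 1.0}, "B": {"X": 1.0}, "C": {"Y": 5.0},
    "X": {"A": 1.0, "B": 1.0, "Y": 1.0}, "Y": {"X": 1.0, "C": 5.0},
}
print(midpoint_root(example))  # root on the Y-C branch, 1.5 from Y
```

Two passes of "find the farthest node" are enough to recover the longest path in a tree with non-negative branch lengths, which keeps the sketch short; a phylogenetics library would normally be used on real data.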


Figure 3. Phylogenies of RdRp of alphacoronaviruses and betacoronaviruses. The trees 
were inferred using representative CoV sequences from this study as well as sequences 
obtained from GenBank. Sequences are labeled with accession number or strain name, host 
(species), and geographic origin (three-letter country code). Colors distinguish the 
following groups: Kenyan bat CoVs discovered in this study (orange), CoVs identified in 
non-bat mammals (blue), the Perimyotis subflavus virus previously reported to be related to 
HCoV-NL63 (green), and the remaining bat viruses (black). The lineage information for Kenyan 
CoVs is shown to the right of the phylogeny and matches that in Figure 2. 


Figure 4. Mantel correlograms for the Kenyan bat CoV RdRp sequences stratified by 
(A) geographic distance and (B) host genetic distance. A Mantel correlation index (r) was 
calculated for each distance class. Under the null hypothesis of no relationship between the 
distance matrices, r values would be close to zero. Positive r values suggest lower genetic 
distances between the sequence pairs in a distance class, whereas negative r values suggest 
higher genetic distances. Solid dots: r significantly different from zero; hollow dots: r not 
significantly different from zero. Panel (B) also shows kernel density plots of intra-genus 
host genetic distances (grey solid line) and inter-genus host genetic distances (grey dotted 
line); the corresponding y-axis is shown on the right of panel (B). The grey box between the 
two density plots marks the transition area between intra-genus and inter-genus host genetic 
distances. 
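A Mantel test correlates the pairwise entries of two distance matrices and assesses significance by permuting the rows and columns of one matrix together; a Mantel correlogram repeats the test with a binary model matrix for each distance class. The sketch below is a minimal pure-Python illustration under stated assumptions: the function names, the 999 permutations, the one-tailed test, and the 0/1 coding of distance classes (chosen so that a positive r means genetically closer pairs, matching the sign convention in the legend) are our choices and not necessarily those of the software used for Figure 4. The toy distance matrices are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def upper(m: Matrix) -> List[float]:
    """Entries above the diagonal: one value per pair of sequences."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(a: Matrix, b: Matrix, permutations: int = 999,
           seed: int = 1) -> Tuple[float, float]:
    """One-tailed Mantel test: returns (r, permutation P value)."""
    rng = random.Random(seed)
    x = upper(a)
    r_obs = pearson(x, upper(b))
    order = list(range(len(b)))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)  # relabel rows and columns of b together
        permuted = [[b[i][j] for j in order] for i in order]
        if pearson(x, upper(permuted)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


def class_matrix(d: Matrix, lo: float, hi: float) -> Matrix:
    """Model matrix for one correlogram distance class [lo, hi).

    Pairs inside the class are coded 0 and all others 1, so a positive r against
    genetic distance means pairs in the class are genetically closer than the rest.
    """
    return [[0.0 if lo <= v < hi else 1.0 for v in row] for row in d]


# Hypothetical toy data: six sites on a line; genetic distance doubles spatial distance.
geo = [[float(abs(i - j)) for j in range(6)] for i in range(6)]
gen = [[2.0 * v for v in row] for row in geo]
print(mantel(gen, geo))                          # expect r close to 1 and a small P
print(mantel(gen, class_matrix(geo, 0.5, 1.5)))  # first distance class: positive r
```

Permuting whole rows and columns, rather than individual matrix cells, preserves the dependence among pairwise distances that share a sequence, which is why the Mantel test is used instead of an ordinary correlation test.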


Figure 5. Genome organization of two bat 229E-like and three bat NL63-like viruses sampled 
from Kenyan bats. A unified length scale is used for all genomes. Within each genome, the 
ORFs (arrow boxes) and ribosomal frameshift sites (vertical lines) are indicated at their 
corresponding positions. 


Figure 6. Phylogenetic analyses of the major open reading frames of NL63-like and 
229E-like CoVs in the context of alphacoronaviruses, revealing evidence of recombination. 
Viruses sequenced in this study are shown in orange. Three potentially recombinant genomes, 
HCoV-NL63, BtKYNL63-15, and HKU8, are indicated with red circles, blue triangles, and brown 
squares, respectively. 


Figure 7. Relationships between HCoV-NL63 and related viruses in the receptor binding 
domain. (A) Alignment of NL63-like, 229E-like, and related viruses in the receptor binding 
domain. The positions of the three receptor binding motifs (RBMs) are marked with 
double-headed arrows. Residues in the HCoV-NL63 RBMs that directly contact the ACE2 receptor 
are marked with red downward arrows. (B) Phylogenetic relationships of NL63-like and 
229E-like viruses in the region corresponding to the receptor binding domain of HCoV-NL63. 
The tree is based on an amino acid alignment and is mid-point rooted. 
Figure 8. Recombination analysis of HCoV-NL63 using SimPlot. Genome-scale similarity 
comparisons of HCoV-NL63 (query) against BtKYNL63-9a (major parental group, blue), 
BtKYNL63-9b (green), BtKY229E-8 (minor parental group, red), HCoV-229E (orange), 
BtCoV/FO1A-F2/Hip_aba/GHA/2010 (pink), and Alpaca respiratory CoV (brown). A full genome 
structure, with reference to HCoV-NL63, is shown above the similarity plot, marking the 
positions and boundaries of the major open reading frames. At the beginning of the S gene, 
the flat line followed by a sudden drop in similarity is due to a gap (a deletion within the 
HCoV-229E S gene) in the alignment. 
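The Figure 8 panel lists its SimPlot settings: a 1000-bp window, a 10-bp step, GapStrip on, and the Kimura 2-parameter distance. The sketch below approximates that kind of sliding-window scan in Python, assuming that similarity is reported as 1 minus the Kimura 2-parameter distance in each window and that input sequences contain only A, C, G, T, and gaps; the function names are hypothetical and this is not SimPlot's own code. Because the standard Kimura formula estimates transition and transversion proportions from each window, the sketch does not use the T/t = 2.0 setting shown in the figure.

```python
import math
from typing import Dict, List, Tuple

PURINES, PYRIMIDINES = set("AG"), set("CT")


def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance for two ungapped, equal-length A/C/G/T strings."""
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    p, q = transitions / len(a), transversions / len(a)
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return math.inf  # too divergent: the correction is undefined
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def similarity_plot(query: str, refs: Dict[str, str], window: int = 1000,
                    step: int = 10) -> Dict[str, List[Tuple[int, float]]]:
    """Sliding-window similarity (1 - K2P distance) of `query` against each reference.

    All sequences must be rows of the same alignment. Columns with a gap in any
    sequence are removed first (a GapStrip-like step); reported positions are
    window centres in original alignment coordinates.
    """
    seqs = [query.upper()] + [s.upper() for s in refs.values()]
    keep = [i for i in range(len(query)) if all(s[i] != "-" for s in seqs)]
    stripped = ["".join(s[i] for i in keep) for s in seqs]
    q = stripped[0]
    results: Dict[str, List[Tuple[int, float]]] = {name: [] for name in refs}
    for start in range(0, len(q) - window + 1, step):
        centre = keep[start + window // 2]
        for name, seq in zip(refs, stripped[1:]):
            d = k2p_distance(q[start:start + window], seq[start:start + window])
            results[name].append((centre, 1.0 - d))
    return results


# Hypothetical toy alignment with a small window so the effect is visible.
query = "ACGTACGTAC"
refs = {"identical": query, "one_transition": "GCGTACGTAC"}
for name, points in similarity_plot(query, refs, window=4, step=2).items():
    print(name, [(pos, round(sim, 3)) for pos, sim in points])
```

A recombination breakpoint shows up in such a plot as a position where the reference with the highest similarity to the query changes, as with BtKYNL63-9a and BtKY229E-8 around the S gene in Figure 8.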


Tables 


Table 1. Results of Mantel tests and partial Mantel tests comparing two factors (host genetic 
distance and geographic distance) that predict the structure of virus genetic diversity 

Model                     r value for Kenyan bats (P value) 
Host (a)                  0.5265 (P < 0.0001) (c) 
Host | Geography (b)      0.5055 (P < 0.0001) (c) 
Geography (a)             0.2122 (P < 0.0001) (c) 
Geography | Host (b)      0.1285 (P = 0.0005) (c) 

(a) Mantel test; (b) partial Mantel test; (c) significant at 0.001. 
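A partial Mantel test, as in the "Host | Geography" and "Geography | Host" rows of Table 1, correlates two distance matrices after removing the linear effect of a third, using the first-order partial correlation r_xy.z = (r_xy - r_xz r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). A minimal sketch, assuming that only the virus matrix is permuted (one common scheme among several), 999 permutations, and hypothetical toy coordinates; it is not necessarily the exact procedure behind Table 1.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def upper(m: Matrix) -> List[float]:
    """Entries above the diagonal: one value per pair."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """Correlation of x and y after removing the linear effect of z from both."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def partial_mantel(virus: Matrix, host: Matrix, geo: Matrix,
                   permutations: int = 999, seed: int = 1) -> Tuple[float, float]:
    """'Host | Geography' style test: virus vs host distance, controlling for geography."""
    rng = random.Random(seed)
    y, z = upper(host), upper(geo)
    r_obs = partial_r(upper(virus), y, z)
    order = list(range(len(virus)))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        rng.shuffle(order)  # permute the virus matrix only
        permuted = [[virus[i][j] for j in order] for i in order]
        if partial_r(upper(permuted), y, z) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical toy data: virus distances track host distances exactly.
hx, gx = [0, 1, 3, 7, 15], [0, 5, 2, 9, 4]
host = [[float(abs(a - b)) for b in hx] for a in hx]
geo = [[float(abs(a - b)) for b in gx] for a in gx]
virus = [row[:] for row in host]
print(partial_mantel(virus, host, geo))  # expect r close to 1 and a small P
```

Swapping the `host` and `geo` arguments gives the "Geography | Host" variant.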


Table 2. Multiple regression of virus genetic distance against host genetic distance and 
geographic distance in Kenyan bat CoVs (2007-2010) 

Variable      Correlation coefficient      P-value 
Host          7.58E-01                     1.00E-04 
Geography     1.19E-06                     1.00E-02 
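One common way to fit a regression like that in Table 2 is multiple regression on distance matrices: the pairwise entries are fitted by ordinary least squares, and significance is assessed by permuting the response matrix, because pairwise distances are not independent observations. The sketch below assumes that approach, two-tailed tests on the raw coefficients, 999 permutations, and hypothetical toy data; the function names are illustrative, and this is not necessarily how the values in Table 2 were obtained.

```python
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def upper(m: Matrix) -> List[float]:
    """Entries above the diagonal: one value per pair."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(y: Sequence[float], *xs: Sequence[float]) -> List[float]:
    """Least-squares fit via the normal equations: [intercept, b1, b2, ...]."""
    cols = [[1.0] * len(y)] + [list(x) for x in xs]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


def mrm(virus: Matrix, host: Matrix, geo: Matrix,
        permutations: int = 999, seed: int = 1) -> Tuple[List[float], List[float]]:
    """Regress virus distance on host and geographic distance; permutation P values."""
    rng = random.Random(seed)
    xh, xg = upper(host), upper(geo)
    coef = ols(upper(virus), xh, xg)[1:]  # drop the intercept
    order = list(range(len(virus)))
    hits = [1, 1]
    for _ in range(permutations):
        rng.shuffle(order)  # permute the response matrix only
        permuted = [[virus[i][j] for j in order] for i in order]
        perm_coef = ols(upper(permuted), xh, xg)[1:]
        for k in range(2):  # two-tailed: compare absolute coefficients
            if abs(perm_coef[k]) >= abs(coef[k]):
                hits[k] += 1
    return coef, [h / (permutations + 1) for h in hits]


hx, gx = [0, 1, 3, 7, 15], [0, 5, 2, 9, 4]  # hypothetical host and site coordinates
host = [[float(abs(a - b)) for b in hx] for a in hx]
geo = [[float(abs(a - b)) for b in gx] for a in gx]
virus = [[2.0 * h + 0.5 * g for h, g in zip(hr, gr)] for hr, gr in zip(host, geo)]
coef, pvals = mrm(virus, host, geo)
print(coef, pvals)  # coefficients close to 2.0 (host) and 0.5 (geography)
```

Raw coefficients depend on the units of each predictor, so a very small geography coefficient can still be significant if geographic distances are measured on a large numeric scale.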
Table 3. Sequence comparisons of the Kenyan bat CoVs with HCoV-229E or HCoV-NL63. Genome 
identity is nucleotide % identity; all other columns are amino acid % identity. 

               Genome     Concatenated 
Virus          identity   domains       ADRP  nsp5  nsp12  nsp13  nsp14  nsp15  nsp16  1ab  S   ORF3/4 
Identity to HCoV-229E 
BtKY229E-1     88         98            92    98    97     99     97     96     94     95   75  92 
BtKY229E-8     88         97            89    98    98     98     97     97     94     96   74  94 
Identity to HCoV-NL63 
BtKYNL63-9a    78         91            75    89    93     94     89     88     94     86   53  67 
BtKYNL63-9b    68         83            51    76    88     91     82     81     84     72   52  55 
BtKYNL63-15    68         84            51    76    88     91     82     81     87     72   49  55 
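Pairwise identities such as those in Table 3 are computed from aligned sequences. How alignment gaps are handled is an assumption here: the minimal sketch below ignores columns in which either sequence has a gap, which is one common convention; dividing by the full alignment length instead would give lower values. The function name and the sequence fragments are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned amino acid fragments: 4 of 5 ungapped columns match.
print(percent_identity("MKV-LL", "MKIALL"))  # 80.0
```

The same function applies to nucleotide (genome) and amino acid (per-domain) alignments, since it only compares characters column by column.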
Table 4. Genomic features of the open reading frames (ORFs) in the Kenyan bat CoVs and their putative transcription regulatory sequences (TRSs). 

                             229E-like virus                                 NL63-like virus 
Feature                      BtKY229E-1              BtKY229E-8              BtKYNL63-9a             BtKYNL63-9b             BtKYNL63-15 
Sequence length (nt)         27837                   27666                   28363                   28677                   28479 
GC%                          38                      39                      39                      ?                       ? 
ORF1ab size (nt)             20286                   20304                   20277                   20349                   20355 
ORF1ab putative TRS          TCTCAACTAAAC(N219)AUG   TCTCAACTAAAC(N219)AUG   CTCAACTAAAC(N215)AUG    TCTCAACTAAAC(N215)AUG   CAACTAAAC(N214)AUG 
S size (nt)                  4095                    4095                    4119                    4122                    4134 
S putative TRS               ?                       UCTCAACUAAA(N4)AUG      TCAACTAAAC(N1)AUG       CTCAACTAAAUG            TCAACTAAAC(N1)AUG 
ORF3/4 size (nt)             681                     684                     684                     684                     684 
ORF3/4 putative TRS          ?                       TCAACTAAAC(N37)AUG      TCAACTAAAC(N37)AUG      TCAACTAAAC(N37)AUG      CAACUAAAC(N37)AUG 
E size (nt)                  234                     234                     234                     234                     234 
E putative TRS               TCTCAACTAAAC(N151)AUG   TCTTCAATGTAAC(N281)AUG  ?                       TCTGCTAAC(N151)AUG      TCTGATAAC(N151)AUG 
M size (nt)                  681                     681                     693                     681                     684 
M putative TRS               ?                       AACTAAAC(N4)AUG         CTAAAC(N6)AUG           TCTAAACTAAAC(N4)AUG     ? 
N size (nt)                  1161                    1200                    1225                    1302                    1302 
N putative TRS               ?                       ATCTAAAC(N11)AUG        TCTAAACTAAAC(N3)AUG     CTAAACCAAAC(N4)AUG      ? 
ORF after N, size (nt)       288                     198                     287                     291                     270 
ORF after N, putative TRS    UCAACUAAAAC(N1)AUG      UCAACUAAAAC(N4)AUG      CAAAACCUAAC(N112)AUG    TCAACTAAAC(N?)AUG       CAACUAAAC(N234)AUG 
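The TRS entries in Table 4 use a "motif(N)AUG" notation: a conserved core sequence, N intervening nucleotides, and the start codon of the downstream ORF. A minimal sketch of how such candidates can be located, assuming a hypothetical core motif of CTAAAC (which occurs within several of the tabulated TRSs) and an arbitrary 300-nt cap on the spacer; the function name is illustrative, and candidate hits would still need to be checked against the annotated ORF start sites.

```python
import re
from typing import List, Tuple


def find_trs(genome: str, core: str = "CTAAAC",
             max_spacer: int = 300) -> List[Tuple[int, int, int]]:
    """Scan for a TRS core motif and the first downstream start codon.

    Returns (motif_start, spacer, atg_start) tuples, where `spacer` is the number
    of nucleotides between the end of the motif and the ATG, i.e. the N in the
    'motif(N)AUG' notation. Positions are 0-based.
    """
    seq = genome.upper().replace("U", "T")  # accept RNA or DNA input
    hits = []
    # Zero-width lookahead so that overlapping motif occurrences are all reported.
    for match in re.finditer(f"(?={re.escape(core.upper())})", seq):
        motif_end = match.start() + len(core)
        atg = seq.find("ATG", motif_end)
        if atg != -1 and atg - motif_end <= max_spacer:
            hits.append((match.start(), atg - motif_end, atg))
    return hits


# Hypothetical fragment: TCTCAACTAAAC, three spacer nucleotides, then the start codon.
print(find_trs("GGTCTCAACTAAACGCAAUGCCC"))  # [(8, 3, 17)]
```

Because the search for ATG begins after the motif ends, this sketch does not report cases such as CTCAACTAAAUG in Table 4, where the start codon overlaps the end of the conserved sequence.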
[Figure 3. RdRp phylogenies of alphacoronaviruses (Alpha-CoV) and betacoronaviruses (Beta-CoV).] 
[Figure 6. Phylogenies of the major open reading frames of NL63-like and 229E-like CoVs, including a spike protein (non-recombinant region) panel; scale bars, 0.1.] 
[Figure 7. (A) Receptor binding domain alignment and (B) receptor binding domain phylogeny of NL63-like and 229E-like viruses; scale bar, 0.2.] 
[Figure 8. SimPlot of HCoV-NL63 (query) vs BtKYNL63-9a, BtKYNL63-9b, BtKY229E-8, Human coronavirus 229E, BtCoV/FO1A-F2/Hip aba/GHA/2010, and Alpaca respiratory coronavirus. Window: 1000 bp, Step: 10 bp, GapStrip: On, Kimura (2-parameter), T/t: 2.0. x-axis: Position (bp), 0 to 28,000; y-axis: Genetic Similarity, 0.0 to 1.0.] 